[Feature] Add LLM Token Rate Limit #602
Conversation
Pull Request Overview
This PR adds LLM (Large Language Model) Token Rate Limiting functionality to sentinel-golang, implementing two rate limiting strategies: Fixed Window and PETA (Predictive Estimated Token Allowance). The feature enables token-based rate limiting for LLM API calls with support for multiple token counting strategies and Redis-based distributed rate limiting.
Key Changes:
- Implements Fixed Window and PETA token rate limiting strategies with a Redis backend (see the reserve-then-settle sketch after this list)
- Adds token encoding support (currently OpenAI) for estimating token usage
- Provides adapters for Eino framework integration
- Includes comprehensive example implementation with Gin middleware
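The PETA strategy's exact semantics are defined in this PR's `core/llm_token_ratelimit` package and are not reproduced here. As a rough, in-memory sketch of the reserve-then-settle idea behind a predictive token allowance (all names below, such as `petaWindow`, `Reserve`, and `Settle`, are hypothetical illustrations, not the PR's API; the real implementation is Redis-backed and distributed):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// petaWindow is a hypothetical, single-process illustration of the
// reserve-then-settle idea: reserve an *estimated* token count before
// the LLM call, then settle against the *actual* usage reported by
// the provider. The PR's implementation does this atomically in Redis.
type petaWindow struct {
	mu       sync.Mutex
	limit    int64 // token budget per window
	used     int64 // tokens reserved or consumed in the current window
	resetAt  time.Time
	interval time.Duration
}

func (w *petaWindow) rollover() {
	if time.Now().After(w.resetAt) {
		w.used = 0
		w.resetAt = time.Now().Add(w.interval)
	}
}

// Reserve rejects the request if the estimated tokens would exceed the budget.
func (w *petaWindow) Reserve(estimated int64) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.rollover()
	if w.used+estimated > w.limit {
		return false // predicted usage would break the window budget
	}
	w.used += estimated
	return true
}

// Settle corrects the window once the provider reports real usage:
// over-estimates are refunded, under-estimates are charged.
func (w *petaWindow) Settle(estimated, actual int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.used += actual - estimated
	if w.used < 0 {
		w.used = 0
	}
}

func main() {
	w := &petaWindow{limit: 1000, interval: time.Minute, resetAt: time.Now().Add(time.Minute)}
	est := int64(120) // e.g. a tiktoken-based estimate of the prompt
	if !w.Reserve(est) {
		fmt.Println("rate limited")
		return
	}
	// ... perform the LLM call, read actual token usage from the response ...
	w.Settle(est, 150)
	fmt.Println("window usage:", w.used)
}
```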
Reviewed Changes
Copilot reviewed 69 out of 102 changed files in this pull request and generated 3 comments.
| File | Description | 
|---|---|
| go.mod | Updates dependencies to support token rate limiting (Redis, tiktoken, copier, etc.) | 
| pkg/adapters/eino/wrapper.go | Implements LLM wrapper for Eino framework with token rate limiting | 
| pkg/adapters/eino/options.go | Defines configuration options for Eino adapter | 
| core/llm_token_ratelimit/*.go | Core implementation of token rate limiting logic, rule management, and strategies | 
| core/llm_token_ratelimit/script/*.lua | Redis Lua scripts for atomic rate limiting operations (see the sketch after this table) | 
| example/llm_token_ratelimit/* | Complete example with Gin server and LLM client integration | 
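The atomic window accounting lives in the Lua scripts under `core/llm_token_ratelimit/script/`, which are not reproduced here. As a minimal sketch of how such a script can be executed atomically from Go, assuming the `github.com/redis/go-redis/v9` client (the key name and script body below are illustrative, not the PR's):

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// Illustrative fixed-window script: consume ARGV[2] tokens from a
// per-window budget of ARGV[1], expiring the counter key after
// ARGV[3] seconds. Returns 1 if the tokens fit, 0 otherwise.
const fixedWindowScript = `
local used  = tonumber(redis.call('GET', KEYS[1]) or '0')
local limit = tonumber(ARGV[1])
local cost  = tonumber(ARGV[2])
if used + cost > limit then
  return 0
end
redis.call('INCRBY', KEYS[1], cost)
if used == 0 then
  redis.call('EXPIRE', KEYS[1], ARGV[3])
end
return 1
`

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	script := redis.NewScript(fixedWindowScript)

	// Hypothetical key layout: one counter per resource per window.
	allowed, err := script.Run(ctx, rdb,
		[]string{"llm:tokens:gpt-4o:window"},
		10000, // window token limit
		250,   // tokens this request wants to consume
		60,    // window length in seconds
	).Int()
	if err != nil {
		panic(err)
	}
	fmt.Println("allowed:", allowed == 1)
}
```

Keeping the read-check-increment sequence inside one script is what makes the window safe under concurrent callers across multiple instances; separate GET/INCRBY round trips would race.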
Overall, the feature functionality is solid with no critical issues (as previously discussed). The rule loading mechanism still needs full alignment with Sentinel's existing datasource architecture. The proposed adjustments can be prioritized in future releases based on your schedule, but could you confirm your plan to proceed or defer them? This implementation is acceptable for the current preview/experimental release phase.
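For background on the datasource point: sentinel-golang rule modules are normally wired to external configuration through `ext/datasource` property handlers. A minimal sketch of what that alignment could look like is below; `Rule` and `LoadRules` are hypothetical stand-ins for this PR's rule API, while `PropertyConverter`, `PropertyUpdater`, and `NewDefaultPropertyHandler` are the existing `ext/datasource` extension points.

```go
package main

import (
	"encoding/json"

	"github.com/alibaba/sentinel-golang/ext/datasource"
)

// Rule and LoadRules are hypothetical stand-ins for the PR's
// core/llm_token_ratelimit rule-management API; only the datasource
// wiring below uses sentinel-golang's existing extension points.
type Rule struct {
	Resource   string `json:"resource"`
	Strategy   string `json:"strategy"` // e.g. "fixed-window" or "peta"
	TokenLimit int64  `json:"tokenLimit"`
}

func LoadRules(rules []*Rule) (bool, error) { return true, nil }

// tokenRuleParser converts raw datasource bytes into rule objects.
func tokenRuleParser(src []byte) (interface{}, error) {
	var rules []*Rule
	if err := json.Unmarshal(src, &rules); err != nil {
		return nil, err
	}
	return rules, nil
}

// tokenRuleUpdater pushes parsed rules into the rule manager, so the
// module reacts to config changes the same way flow rules do.
func tokenRuleUpdater(data interface{}) error {
	rules, _ := data.([]*Rule)
	_, err := LoadRules(rules)
	return err
}

func main() {
	handler := datasource.NewDefaultPropertyHandler(tokenRuleParser, tokenRuleUpdater)
	_ = handler // register with a file/etcd/nacos datasource as usual
}
```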
Merged commit 8cd4cfa into alibaba:ospp/llm-token-rate-limit. Refers to: #596
Performance Test Report